Mapalester: Powerful, Easy-to-Use GIS Software Under Development
Many non-profits and K-12 schools would benefit from a GIS, but cannot afford expensive GIS software. I have worked on developing a new GIS software package aimed at these and other organizations and individuals with small budgets. The software is particularly focused on ease of use and centralization, especially with regard to spatial analysis operations and data retrieval/organization. In the paper, I discuss my progress on the software, as well as the major problems I have encountered in developing it.
Measuring the Importance of User-Generated Content to Search Engines
Search engines are some of the most popular and profitable intelligent
technologies in existence. Recent research, however, has suggested that search
engines may be surprisingly dependent on user-created content like Wikipedia
articles to address user information needs. In this paper, we perform a
rigorous audit of the extent to which Google leverages Wikipedia and other
user-generated content to respond to queries. Analyzing results for six types
of important queries (e.g. most popular, trending, expensive advertising), we
observe that Wikipedia appears in over 80% of results pages for some query
types and is by far the most prevalent individual content source across all
query types. More generally, our results provide empirical information to
inform a nascent but rapidly-growing debate surrounding a highly-consequential
question: Do users provide enough value to intelligent technologies that they
should receive more of the economic benefits from intelligent technologies?
Comment: This version includes a bibliography entry that was missing from the
first version of the text due to a processing error. This is a preprint of a
paper accepted at ICWSM 2019. Please cite that version instead.
Behavioral Use Licensing for Responsible AI
Scientific research and development relies on the sharing of ideas and
artifacts. With the growing reliance on artificial intelligence (AI) for many
different applications, the sharing of code, data, and models is important to
ensure the ability to replicate methods and the democratization of scientific
knowledge. Many high-profile journals and conferences expect code to be
submitted and released with papers. Furthermore, developers often want to
release code and models to encourage development of technology that leverages
their frameworks and services. However, AI algorithms are becoming increasingly
powerful and generalized. Ultimately, the context in which an algorithm is
applied can be far removed from that which the developers had intended. A
number of organizations have expressed concerns about inappropriate or
irresponsible use of AI and have proposed AI ethical guidelines and responsible
AI initiatives. While such guidelines are useful and help shape policy, they
are not easily enforceable. Governments have taken note of the risks associated
with certain types of AI applications and have passed legislation. While these
are enforceable, they require prolonged scientific and political deliberation.
In this paper we advocate the use of licensing to enable legally enforceable
behavioral use conditions on software and data. We argue that licenses serve as
a useful tool for enforcement in situations where it is difficult or
time-consuming to legislate AI usage. Furthermore, by using such licenses, AI
developers provide a signal to the AI community, as well as governmental
bodies, that they are taking responsibility for their technologies and are
encouraging responsible use by downstream users.
Demographic Inference and Representative Population Estimates from Multilingual Social Media Data
Social media provide access to behavioural data at an unprecedented scale and
granularity. However, using these data to understand phenomena in a broader
population is difficult due to their non-representativeness and the bias of
statistical inference tools towards dominant languages and groups. While
demographic attribute inference could be used to mitigate such bias, current
techniques are almost entirely monolingual and fail to work in a global
environment. We address these challenges by combining multilingual demographic
inference with post-stratification to create a more representative population
sample. To learn demographic attributes, we create a new multimodal deep neural
architecture for joint classification of age, gender, and organization-status
of social media users that operates in 32 languages. This method substantially
outperforms current state of the art while also reducing algorithmic bias. To
correct for sampling biases, we propose fully interpretable multilevel
regression methods that estimate inclusion probabilities from inferred joint
population counts and ground-truth population counts. In a large experiment
over multilingual heterogeneous European regions, we show that our demographic
inference and bias correction together allow for more accurate estimates of
populations and make a significant step towards representative social sensing
in downstream applications with multilingual social media.
Comment: 12 pages, 10 figures, Proceedings of the 2019 World Wide Web
Conference (WWW '19)